Statistics and Probability Primer for Computational Biologists
Abstract
Introduction

Why do you need to know the theory behind statistics and probability to be proficient in computational biology? While it is true that software to analyze biological data often comes prepackaged and (sometimes) in working form, it is important to know how these tools work if we are to interpret their results properly. Even more important, the software to do what we want often does not yet exist, so we are forced to write algorithms ourselves. For this latter case especially, we need to be intimately familiar not only with the language of statistics and probability, but also with examples of how these approaches can be used.

In this primer, our goal is to convey the essential information about statistics and probability that is required to understand algorithms in modern computational biology. We start with an introduction to basic statistical terms such as mean and standard deviation. Next we turn our attention to probability and describe ways that probabilities can be manipulated and interconverted. Following this, we introduce the concept of a discrete distribution and show how to derive useful information from this type of model. As an extension of discrete distributions, we then introduce continuous distributions and show how they can be used. Finally, we discuss a few of the more advanced tools of statistics, such as p-values and confidence intervals. Appendices A and B cover optional material on more complex probability distributions and on Bayesian networks. If you are already comfortable with basic statistical terms, such as mean and variance, then feel free to skip the first chapter.

In each section, our goal is to provide both intuitive and formal descriptions of each approach. To aid intuitive understanding of the material, we have included a number of worked examples that relate to biologically relevant questions. In addition, we include example problems at the end of each chapter as an exercise for the reader. Answers to these problems are listed in Appendix C. Working the problems at the end of each chapter is essential to ensure comprehension of the material covered. More difficult optional problems that may require significant mathematical expertise are marked with a star next to their number.